Controlling Shared Memory

ABSTRACT

In view of the characteristics of distributed applications, the present invention proposes a technical solution for applying a shared memory on a NIC, the NIC comprising: a shared memory configured to provide shared storage space for a task of a distributed application, and a microcontroller. Furthermore, the present invention provides a computer device that includes the above-mentioned NIC, a method for controlling a read/write operation on a shared memory of a NIC, and a method for invoking the NIC. The technical solution provided in the present invention bypasses the processing of the network protocol stack and thus avoids the time delay introduced by the network protocol stack. The present invention does not need to perform TCP/IP encapsulation on the data packet, thus greatly reducing the additional packet header and packet tail overheads generated by TCP/IP layer data encapsulation.

CROSS-REFERENCE TO RELATED APPLICATION

This application claims priority under 35 U.S.C. §119 from Chinese Patent Application No. 201110047985.X filed Feb. 28, 2011, the entire contents of which are incorporated herein by reference.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention generally relates to technologies for processing data on a network interface card (NIC), and specifically to a NIC, a computer device, a method for controlling a read/write operation on a shared memory on a NIC and a method for scheduling a NIC.

2. Description of Related Art

A distributed application refers to an application distributed on different computer nodes and accomplishing a task together through a network. The task can be divided into a plurality of processes, and different processes can be distributed on different computer nodes. The plural processes need to invoke each other frequently, or perform plural read/write operations on the same data. The processes of a distributed application distributed on different nodes usually perform network communications using the TCP/IP protocol. TCP/IP is a general communication protocol for supporting communications of almost all kinds of applications on the transport layer/network layer. The TCP/IP protocol does not provide a protocol architecture customized for distributed applications.

On traditional distributed computer architecture, an independent program buffer for a distributed application is allocated on each computer node. Each process of the distributed program independently performs operations on the program buffer, and performs data transmission through a TCP/IP network. According to the traditional architecture, data is required to go through multi-layered encapsulation before being transmitted through the network, as well as multi-layered decapsulation after being transmitted through the network. The above encapsulation and decapsulation processes introduce delays at multiple points during the entire data transmission process, and cause many unnecessary system overheads.

FIG. 1 illustrates a schematic diagram of a system for performing data communication between two computer nodes in the prior art. Specifically, in the example shown in FIG. 1, computer node A requests to read a piece of data from computer node B. The architecture of computer node A includes an application process A, a language runtime A, a network protocol stack A, a device driver module A, a NIC A and a program buffer A, the program buffer usually residing in the physical memory of the computer node. The computer node A can further include other devices not shown in FIG. 1, such as a CPU. The architecture of computer node B is identical to that of computer node A.

In step S1, the application process A transmits a read data request to the language runtime A through a dedicated programming interface; in step S2, the language runtime A converts the read data request into a network data transmission request, and passes it to the network protocol stack A for processing; in step S3, the network protocol stack A, after performing TCP/IP encapsulation on the data, invokes the device driver module A to initiate a direct memory access (DMA) operation of the NIC A; in step S4, the NIC A copies the address of the program buffer A to the NIC memory (not shown) on the NIC A through the DMA operation; in step S5, the NIC A transmits the content in its NIC memory to the NIC B of the other computer node B; in step S6, the NIC B generates an interrupt signal after receiving the data request packet from the NIC A, and informs the device driver module B; in step S7, the device driver module B copies the data request packet from the NIC memory of the NIC B to the program buffer B; in step S8, the device driver module B informs the network protocol stack B of the event of the arrival of the data request packet, and requests the network protocol stack B to parse the arrived data request packet; in step S9, by parsing the data request packet, the network protocol stack B learns that the content in the data request packet is a read data request, and informs the application process B through the language runtime B; in step S10, the application process B reads the data required by the computer node A, and constructs a network response notification, then invokes the language runtime B requesting to transmit the data; in step S11, the language runtime B passes the network response notification to the network protocol stack B to form a network data transmission request; in step S12, after performing TCP/IP protocol encapsulation on the data, the network protocol stack B invokes the device driver module B and indicates the address of the program buffer B in which the data to be transmitted is located to initiate the NIC B to perform a DMA operation; in step S13, the NIC B copies the data from the program buffer B to the NIC memory on the NIC B through the DMA operation; in step S14, the NIC B transmits the data to the NIC A on the computer node A; in step S15, the NIC A forms an interrupt signal after receiving the data from the network, and informs the device driver module A; in step S16, the device driver module A copies the data from the NIC memory of the NIC A to the network protocol stack A; in step S17, the device driver module A informs the network protocol stack A of the data arrival event to request the network protocol stack A to parse the arrived data; in step S18, the network protocol stack A learns that the content of the data packet is a response corresponding to the read data request by parsing the data packet, and informs the application process A through the language runtime A, so as to make the application process A get the final result.

SUMMARY OF THE INVENTION

One aspect of the present invention provides a network interface card, including: a shared memory configured to provide shared storage for tasks of distributed applications, where said shared memory can be accessed by a plurality of computing nodes executing a same task; and a microcontroller configured to control read/write operations on said shared memory.

Another aspect of the present invention provides a method for controlling a read/write operation on a shared memory of a network interface card, where the shared memory is configured to provide shared storage for tasks of a distributed application, and the shared memory can be accessed by a plurality of computing nodes executing a same task, the method including: determining whether a local network interface card is configured with a shared memory supporting said read/write operation; and performing the read/write operation to the shared memory on the local network interface card when the local network interface card is configured with the shared memory supporting the read/write operation.

Another aspect of the present invention provides a method for invoking a network interface card, the method including: providing a program buffer of a distributed application; invoking a language runtime through a dedicated interface on the language runtime; invoking a device driver module to perform physical layer encapsulation; and controlling a read/write operation on the shared memory of the network interface card.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings referenced by this description are only used to illustrate typical embodiments of the present invention, and shall not be construed as limitation to the scope of the present invention.

FIG. 1 illustrates a schematic diagram of a system for performing data communication between two computer nodes in the prior art.

FIG. 2A illustrates a schematic diagram of an encapsulation structure of a data frame transmitted on a network in the prior art.

FIG. 2B illustrates a schematic diagram of an encapsulation structure of a data frame transmitted on a network according to an embodiment of the present invention.

FIG. 3 illustrates a schematic diagram of an internal structure of a NIC in the prior art.

FIG. 4A illustrates a schematic diagram of the structure of a NIC according to an embodiment of the present invention.

FIG. 4B illustrates a schematic diagram of the structure of a NIC according to another embodiment of the present invention.

FIG. 5A illustrates a schematic diagram of a field structure of a command port according to an embodiment of the present invention.

FIG. 5B illustrates a schematic diagram of the structure of a write operation instance of a command port according to an embodiment of the present invention.

FIG. 5C illustrates a schematic diagram of the structure of an allocation operation instance of a command port according to another embodiment of the present invention.

FIG. 6 illustrates a schematic diagram of the structure of an allocation table in the shared memory according to an embodiment of the present invention.

FIG. 7A illustrates a schematic diagram of a physical layer data frame transmitted according to the RFC894 Ethernet network transmission standard.

FIG. 7B illustrates a schematic diagram of a physical layer data frame transmitted according to the RFC1042 Ethernet network transmission standard.

FIG. 8 illustrates a schematic diagram of a system for performing data transmission between two computer nodes according to an embodiment of the present invention.

FIG. 9 illustrates a schematic diagram of a system for performing data transmission between two computer nodes according to another embodiment of the present invention.

FIG. 10 illustrates a flowchart of a method for controlling a read/write operation on the shared memory of a NIC.

FIG. 11 illustrates a flowchart of a method for determining whether there is a locally configured shared memory supporting a read/write operation according to an embodiment of the present invention.

FIG. 12 illustrates a flowchart of a method for invoking a NIC according to an embodiment of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Numerous specific details are provided in the following discussion to facilitate thorough understanding of the present invention. However, it is obvious to those skilled in the art that the present invention can be understood without these specific details. It should be appreciated that the use of any of the following specific terms is only for the convenience of description, and therefore, the present invention should not be limited to be used in any specific application denoted and/or implied by such terms.

The inventor of the present invention finds that steps S8, S9, S17 and S18 introduce significant and unpredictable delay during the above data read process. Operating system scheduling delay occurs in steps S8 and S17: data receiving can be finished only when the operating system schedules the target application to run, and this delay is difficult to estimate, usually ranging from 1 to 1000 milliseconds. In steps S9 and S18, scheduling delay of the language runtime occurs, and this can be up to several seconds in the worst case.

Besides the time delays, the whole data read process will result in a large amount of network data packet header/tail overheads, since the network protocol stack is responsible for performing the TCP/IP encapsulation/decapsulation on the data during the entire data read process. FIG. 2A illustrates a schematic diagram of the encapsulation structure of a data frame transmitted on a network in the prior art. In the schematic diagram of FIG. 2A, it can be found that when being transmitted on the network, data go through three encapsulation processes of TCP, IP and physical layers. The TCP packet header and tail have 20 bytes in total, the IP packet header and tail have 20 bytes in total, and the physical packet header and tail have 26 bytes in total. Therefore, after being processed by the network protocol stack, if the data packet transmitted on the network has only 1 byte of original data, the finally formed network data frame will have 67 (20+20+26+1) bytes.

A distributed application needs to perform frequent transmission and synchronization of shared data on several specific computer nodes, thus requiring frequent communication among the plural computer nodes and transmission between the program buffer and NIC memory on each computer node. For a task of a distributed application, the main objective is to give the application processes on the computing nodes consistent data views, and to ensure the consistency of data updates. TCP/IP protocol stack processing is not only unnecessary, but also brings additional processing and storage overheads. In view of the above features of a distributed application, the present invention proposes a technical solution of realizing shared memory on a NIC.

Specifically, the present invention provides a device modeled after a NIC, including: a shared memory configured to provide shared storage for a distributed application, where the shared memory can be accessed by a plurality of computing nodes running a same application, as well as a microcontroller configured to control a read/write operation on the shared memory.

In addition, the present invention further provides a computer device, including: the above-described NIC, and a device driver module configured to perform physical layer encapsulation on data directed to the shared memory on the above NIC.

Additionally, the present invention further provides a method for controlling read/write operations on a shared memory of a NIC, where the shared memory is configured to provide shared storage for a distributed application, and the shared memory can be accessed by a plurality of computing nodes running the same application, the method including: determining whether the local NIC is configured with a shared memory supporting the read/write operation, and performing the read/write operation on the shared memory on the local NIC if the local NIC is configured with a shared memory supporting the read/write operation.

Furthermore, the present invention provides a method for invoking a NIC, the method including: providing a program buffer for the distributed application, invoking a language runtime through a dedicated interface on the language runtime, invoking a device driver module to perform physical layer encapsulation, and controlling a read/write operation on the shared memory on the NIC through a dedicated interface on the NIC, where the shared memory is configured to provide shared storage for a task of the distributed application, and the shared memory can be accessed by a plurality of computing nodes running the same application.

By using the technical solution of the present invention, the network stack processing is bypassed, and the device driver module is invoked directly by the language runtime, thus the time delay brought by steps S8, S9, S17, S18 in FIG. 1 can be avoided. In addition, since the present invention bypasses the network protocol stack processing, and does not need to perform TCP/IP encapsulation on the data packet and only needs to perform physical layer encapsulation, the data packet can be transmitted according to the existing physical layer transmission model. This significantly saves the additional packet header and tail overheads brought by the TCP/IP layer data encapsulation.
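The size of the saving can be illustrated with the byte counts given for FIG. 2A. The following is a minimal sketch only: the function name frame_size and its parameter names are hypothetical, and it assumes that a frame carrying shared memory data consists of nothing but the 26-byte physical layer header and tail plus the payload, with no minimum frame padding.

```python
# Header-plus-tail sizes in bytes, as stated in the description of FIG. 2A.
TCP_OVERHEAD = 20
IP_OVERHEAD = 20
PHYSICAL_OVERHEAD = 26


def frame_size(payload_bytes: int, bypass_tcp_ip: bool = False) -> int:
    """Return the size of the transmitted frame for a payload of the given length.

    Assumption: when the protocol stack is bypassed, only the physical
    layer header and tail are added to the payload.
    """
    overhead = PHYSICAL_OVERHEAD
    if not bypass_tcp_ip:
        overhead += TCP_OVERHEAD + IP_OVERHEAD
    return payload_bytes + overhead


print(frame_size(1))                      # 67, the prior-art figure
print(frame_size(1, bypass_tcp_ip=True))  # 27, physical layer only
```

Under these assumptions, a 1-byte payload shrinks from a 67-byte frame to a 27-byte frame, which is the 40 bytes of TCP and IP header and tail that are no longer added.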

FIG. 3 shows a schematic diagram of an internal structure of a NIC in the prior art. The NIC in the prior art includes control logic, a NIC memory, a DMA interface and a medium access unit, where the control logic can be configured to be a programmable chip to control a DMA operation, so as to realize data read/write to the NIC memory. The DMA interface is connected externally to the bus of the computer node to perform a DMA operation. The medium access unit is responsible for receiving data frames from the network and transmitting data frames in the NIC memory to the network. The NIC memory is a temporary storage unit of the data frames.

FIG. 4A shows a schematic diagram of the structure of a NIC according to an embodiment of the present invention. The NIC in FIG. 4A adds a shared memory and a microcontroller to the structure of a conventional NIC. The shared memory is configured to provide shared storage for a distributed application, and the shared memory can be accessed by plural computing nodes running the same application. The microcontroller is configured to control a read/write operation on the shared memory. Furthermore, the microcontroller is configured to determine whether the shared memory supports a received read/write request, and to perform the read/write operation on the shared memory when it determines that the shared memory supports the received read/write request. The microcontroller can be implemented by a simple Field Programmable Gate Array (FPGA), and need not be designed as a complex general processor.

The original control logic of the NIC communicates with the upper-layer device driver module through a specific port, and similarly, the microcontroller can also communicate with the upper-layer device driver module through a specific port. Generally, the device driver module communicating with the control logic and the device driver module communicating with the microcontroller are different device driver modules.

According to an embodiment of the present invention, the NIC memory can be configured to buffer data read from or written to the shared memory. According to another embodiment of the present invention, the shared memory directly performs the read/write operation with the program buffer in the local physical memory without being buffered by the NIC memory. The specific buffering process will be described in detail in the following.

According to an embodiment of the present invention, the NIC also includes a command port CMD (not shown), which is connected with the microcontroller and communicates with the upper-layer device driver module. The command port is configured to receive control commands for the microcontroller, so as to realize the read/write operation on the shared memory. Furthermore, the NIC can also include a state port STAT (not shown), which is connected with the microcontroller and communicates with the upper-layer device driver module. The state port is responsible for providing the state result of the data read/write operation on the shared memory on the NIC, so as to be read and checked by the device driver module. The command port and the state port together realize the controlling of the microcontroller.

In different architectures, different I/O primitives can be used to accomplish port read/write. Taking the IA32 architecture as an example, assuming the port addresses of CMD and STAT are 0x4A0 and 0x4A1 respectively, the following instructions issued by the driver program can accomplish the port read/write:

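MOV EAX, cmd_word   # move the command word into the EAX register
MOV DX, 0x4A0       # load the CMD port address (ports above 0xFF are addressed through DX)
OUT DX, EAX         # write the command word into the CMD port
MOV DX, 0x4A1       # load the STAT port address
IN EAX, DX          # read in the state word from the STAT port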

According to an embodiment of the present invention, in specific hardware implementation, the CMD port can be a 16-bit or 32-bit register that controls read/write of the on-chip storage module. FIG. 5A illustrates a schematic diagram of a field structure of a command port according to an embodiment of the present invention. In the embodiment of FIG. 5A, the CMD port is implemented as a 32-bit register, which includes a TID field, a SIZE/KEY field, a UNIT field, an OP field and a FLAGS field.

Therein, the TID field at bits 24-31 is used to indicate a task identification code (TID), and the task identification code is used to indicate to which task of the distributed application the data read/written belongs. A distributed application can include plural tasks. The present invention can apply shared memory on the NIC for each task to perform data sharing, or apply shared memory on the NIC for one or more tasks to perform data sharing. For the plural computer nodes, either the NIC of one computer node provides a common shared memory for all the tasks, or the NICs of different computer nodes provide shared memory for different tasks respectively.

The OP field at bits 8-11 is used to indicate the operation type, e.g., 0011 denotes a data read operation on the shared memory, 0100 denotes a data write operation on the shared memory, 0001 denotes a memory allocation operation on the shared memory, and 0010 denotes a memory release operation on the shared memory. The read operation and the write operation are the two basic data operations. The allocation operation is used to request allocating a piece of memory space before the data read/write operation, and the release operation is used to release a previously allocated memory space after the data read/write operation.

For different operation types, the meanings of the SIZE/KEY field at bits 16-23 of the command port can also be different. For example, for the allocation operation, bits 16-23 store the size of the storage space requested to be allocated by the shared memory. Each allocated storage space of the shared memory will be assigned a key KEY for identifying a piece of occupied storage space. For the write operation, the read operation and the release operation, the key stored in bits 16-23 is used to denote the storage space of the shared memory to which the write operation, the read operation and the release operation correspond.

The UNIT field at bits 12-15 of the command port is used to denote allocation granularity. In an allocation operation, since the SIZE field at bits 16-23 is only 8 bits wide, the size of the storage space that it can denote is limited; the UNIT field extends the range of storage space sizes that can be requested. According to an embodiment of the present invention, the UNIT field is used to denote a multiple of the SIZE field. For example, for the allocation operation, if what is stored in bits 16-23 is 00000001, then when what is stored in bits 12-15 of the command port is 0001, it indicates that the storage space size of the shared memory requested to be allocated is 1×1; when what is stored in bits 12-15 of the command port is 0010, it indicates that the storage space size of the shared memory requested to be allocated is 1×2; when what is stored in bits 12-15 of the command port is 0011, it indicates that the storage space size of the shared memory requested to be allocated is 1×3, and so forth. According to another embodiment of the present invention, the UNIT field is used to indicate plural multiples of the SIZE field. For example, when what is stored in bits 12-15 is 0001, it indicates that the storage space size of the shared memory requested to be allocated is 1×1; when what is stored in bits 12-15 is 0010, it indicates that the storage space size of the shared memory requested to be allocated is 1×8, and so forth.
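For the first embodiment, in which the UNIT field is a plain multiplier of the SIZE field, the requested size can be computed as sketched below. The function name requested_size is hypothetical, and the sketch assumes sizes are counted in bytes, as in the FIG. 5C instance; the second embodiment is not modeled because the text gives only its first two values.

```python
def requested_size(size_field: int, unit_field: int) -> int:
    """Allocation size when UNIT is a plain multiplier of SIZE (first embodiment)."""
    if not 0 <= size_field <= 0xFF:
        raise ValueError("SIZE is an 8-bit field (bits 16-23)")
    if not 0 <= unit_field <= 0xF:
        raise ValueError("UNIT is a 4-bit field (bits 12-15)")
    return size_field * unit_field


print(requested_size(0b00000001, 0b0011))  # 3, the 1x3 case from the text
print(requested_size(0xFF, 0xF))           # 3825, the largest request this encoding allows
```

With SIZE alone the largest request would be 255; the multiplier raises that ceiling to 255 × 15 under this reading.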

Optionally, the FLAGS field at bits 0-7 of the command port is used to denote other control options, including whether the storage space of the allocated shared memory is allowed to be modified.

FIG. 5B illustrates a schematic diagram of the structure of a write operation instance of a command port according to an embodiment of the present invention. Therein, the TID field is 00000011, the KEY field is 00000001, the UNIT field is 0001, the OP field is 0100, and the FLAGS field is 00000000. FIG. 5B denotes a command port instance of executing a write data operation on a piece of storage space with a shared memory key of 1 through the CMD port for a task with a task identifier TID of 3.

FIG. 5C shows a schematic diagram of the structure of an allocation operation instance of a command port according to another embodiment of the present invention. Therein, the TID field is 00000011, the SIZE field is 00001010, the UNIT field is 0001, the OP field is 0001, and the FLAGS field is 00000000. FIG. 5C denotes that, for a task with a task identifier 3, it requests the shared memory to allocate a storage space of 10 bytes of shared memory through the CMD port.
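The 32-bit layout of FIG. 5A and the two instances of FIG. 5B and FIG. 5C can be reproduced with a small bit-packing sketch. The helper names pack_cmd and unpack_cmd, the FIELDS table and the OP_* constant names are hypothetical; only the bit positions and the operation codes are taken from the description.

```python
# Operation codes for the OP field (bits 8-11), as listed in the description.
OP_ALLOCATE = 0b0001
OP_RELEASE = 0b0010
OP_READ = 0b0011
OP_WRITE = 0b0100

# (field name, lowest bit position, width in bits) for the 32-bit CMD register.
FIELDS = (
    ("tid", 24, 8),
    ("size_or_key", 16, 8),
    ("unit", 12, 4),
    ("op", 8, 4),
    ("flags", 0, 8),
)


def pack_cmd(tid: int, size_or_key: int, unit: int, op: int, flags: int = 0) -> int:
    """Assemble a command word from its fields, rejecting values that overflow."""
    values = {"tid": tid, "size_or_key": size_or_key, "unit": unit,
              "op": op, "flags": flags}
    word = 0
    for name, shift, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << shift
    return word


def unpack_cmd(word: int) -> dict:
    """Split a command word back into its fields."""
    return {name: (word >> shift) & ((1 << width) - 1)
            for name, shift, width in FIELDS}


fig_5b = pack_cmd(tid=3, size_or_key=1, unit=1, op=OP_WRITE)      # write to key 1
fig_5c = pack_cmd(tid=3, size_or_key=10, unit=1, op=OP_ALLOCATE)  # allocate 10 bytes
print(f"{fig_5b:#010x}")  # 0x03011400
print(f"{fig_5c:#010x}")  # 0x030a1100
```

Keeping the layout in one table means that an adjusted CMD port design only requires changing the FIELDS entries.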

The structural design of the CMD port can be adjusted according to the difference of specific applications, and is not limited to the above-listed instances.

The structural design of the STAT port can also be adjusted according to the difference of specific applications. In an embodiment of the present invention, the structure of the STAT port includes a KEY field and a TID field to indicate the execution status of a read/write command, e.g., whether a read/write operation is executed successfully, or whether a read/write operation invokes a remote computer node. This invention does not specifically define the format of the STAT port.

FIG. 6 illustrates a schematic diagram of the structure of an allocation table in the shared memory according to an embodiment of the present invention. In order to perform effective control on the shared memory in the NIC, according to an embodiment of the present invention, an allocation table is maintained in the shared memory. The allocation table records the tasks supported by the shared memory. Specifically, the allocation table in FIG. 6 includes a TID field, a KEY field, an ADDR field, a LEN field and a FLAGS field, where the TID field records the task identification code supported by the shared memory, the KEY field records the key of the storage space allocated by the shared memory for the corresponding task, the ADDR field records the start address of the shared memory to which the key corresponds, the LEN field records the size of the storage space of the shared memory to which the key corresponds, and the FLAGS field records other related information. Thus, the microcontroller can, by querying the allocation table, learn whether the local shared memory has allocable space and supports the read/write operation required by the device driver module.
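The role of the allocation table can be sketched as a small in-memory structure. The class and method names below are hypothetical, and the allocation policy (each new block placed directly after the previous one, keys numbered from 1, no release) is an assumption made only to keep the sketch short; the description does not prescribe a policy.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    tid: int        # task identification code
    key: int        # key identifying the allocated storage space
    addr: int       # start address within the shared memory
    length: int     # size of the storage space
    flags: int = 0  # other related information


class AllocationTable:
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: List[Entry] = []

    def allocate(self, tid: int, size: int) -> Optional[int]:
        """Reserve `size` units for task `tid`; return the key, or None if no space."""
        addr = sum(e.length for e in self.entries)  # next free address (assumed policy)
        if size <= 0 or addr + size > self.capacity:
            return None
        key = len(self.entries) + 1
        self.entries.append(Entry(tid, key, addr, size))
        return key

    def lookup(self, tid: int, key: int) -> Optional[Entry]:
        """Find the storage space a read, write or release command refers to."""
        for entry in self.entries:
            if entry.tid == tid and entry.key == key:
                return entry
        return None

    def supports_task(self, tid: int) -> bool:
        """True if the shared memory holds at least one allocation for `tid`."""
        return any(entry.tid == tid for entry in self.entries)


table = AllocationTable(capacity=64)
key = table.allocate(tid=3, size=10)
print(key, table.lookup(3, key).addr, table.supports_task(4))  # 1 0 False
```

A query by (TID, KEY) yields the ADDR and LEN needed to carry out a read or write, and a failed allocate corresponds to the case where the local shared memory has no allocable space.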

According to an embodiment of the present invention, the NIC can be further configured with a state switch thereon, which indicates whether the local NIC is configured with a shared memory which is in an enabled working state. The microcontroller can be further configured to determine whether the local NIC is configured with a shared memory module according to the state switch. Moreover, the state in the state switch can be altered to denote whether the shared memory on the local NIC is in an enabled or disabled working state.

Furthermore, if the local NIC is configured with a shared memory, the microcontroller can determine whether the shared memory configured on the local NIC supports a certain read/write operation according to the task identification code TID in the above allocation table. This function is especially useful for the case where the NICs of plural computer nodes are all configured with different shared memories so as to support different distributed program tasks, by which a microcontroller can determine whether the shared memory configured on the local NIC is the shared memory to which a certain read/write operation is directed.
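The two checks described above, the state switch first and the task identification code second, can be combined as in the following sketch. The function name route_request and the return values "local" and "remote" are hypothetical labels; forwarding an unsupported request to a remote node is an assumption drawn from the mention of remote computer nodes elsewhere in the description.

```python
from typing import Set


def route_request(switch_enabled: bool, local_tids: Set[int], tid: int) -> str:
    """Decide whether a read/write request for task `tid` is served locally.

    switch_enabled: state switch, True if the local shared memory is enabled.
    local_tids: task identification codes recorded in the local allocation table.
    """
    if switch_enabled and tid in local_tids:
        return "local"
    return "remote"  # assumed fallback: the request targets another node's shared memory


print(route_request(True, {3, 7}, 3))   # local
print(route_request(True, {3, 7}, 5))   # remote
print(route_request(False, {3, 7}, 3))  # remote
```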

In order to indicate that the write operation is to write data into the shared memory rather than the NIC memory, or to indicate that the read operation is to read data out from the shared memory rather than the NIC memory, the device driver module writes a special identifier in the frame structure of the physical layer data while performing physical layer data encapsulation so as to indicate the packet is targeted at the shared memory. In an embodiment of the present invention, the special identifier is recorded in the type field of the physical layer packet header of the data. FIG. 7A illustrates a schematic diagram of a physical layer data frame transmitted according to the RFC894 Ethernet transmission standard. FIG. 7B illustrates a schematic diagram of a physical layer data frame transmitted according to the RFC1042 Ethernet transmission standard. Both transmission standards include a two-byte type field. Before it writes data to the shared memory, the device driver module first sets the frame type of the physical layer network frame to distinguish it from a common network data packet. For example, generally the type field is 0x0800 to indicate that an IP packet is carried in the data frame. The present invention can distinguish a shared memory read/write packet from a network data packet by setting the type field to 0x00FF. Other embodiments of the present invention can use other fields of the physical layer frame structure packet header to record the special identifier, or change the physical layer frame structure to add a special identifier field.
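The tagging described above can be sketched for the RFC894 layout, in which a 6-byte destination address and a 6-byte source address are followed by the two-byte type field. The function names build_frame and is_shared_memory_frame are hypothetical, the addresses are placeholder values, and the preamble and CRC that also belong to the physical layer header and tail are omitted for brevity.

```python
import struct

ETHERTYPE_IP = 0x0800             # common network data packet carrying IP
ETHERTYPE_SHARED_MEMORY = 0x00FF  # special identifier given in the description


def build_frame(dst: bytes, src: bytes, frame_type: int, payload: bytes) -> bytes:
    """Prefix the payload with destination, source and the two-byte type field."""
    return struct.pack("!6s6sH", dst, src, frame_type) + payload


def is_shared_memory_frame(frame: bytes) -> bool:
    """Check the type field, which sits at byte offset 12 in this layout."""
    (frame_type,) = struct.unpack_from("!H", frame, 12)
    return frame_type == ETHERTYPE_SHARED_MEMORY


dst = bytes.fromhex("020000000002")  # placeholder addresses
src = bytes.fromhex("020000000001")
shm_frame = build_frame(dst, src, ETHERTYPE_SHARED_MEMORY, b"\x2a")
ip_frame = build_frame(dst, src, ETHERTYPE_IP, b"\x2a")
print(is_shared_memory_frame(shm_frame), is_shared_memory_frame(ip_frame))  # True False
```

On the receiving side, a single comparison on the type field is enough to decide whether a frame is handed to the microcontroller or treated as ordinary network traffic.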

As described above, in the present invention, either a common shared memory is provided for all tasks by the NIC of only one computer node in the plural computer nodes, or different shared memories are provided for different tasks by the NICs of different computer nodes. In the former embodiment, the NIC of only one computer node among the plural computer nodes is configured with both the shared memory and the microcontroller, while the NICs of other computer nodes are merely configured with a microcontroller to realize controlling the read/write operation on the remote shared memory. FIG. 4B illustrates a schematic diagram of a NIC structure according to another embodiment of the present invention. The NIC of FIG. 4B includes control logic, a NIC memory, a medium access unit, a DMA interface and a microcontroller. Different from FIG. 4A, the NIC in FIG. 4B does not include a shared memory. The microcontroller in FIG. 4B is only used to provide control logic for the shared memory of a remote computer node, and is not used to control a local shared memory. The specific details will be described in more detail below.

The following describes the different data flows in the above two embodiments with respect to the architecture of the computer nodes.

Embodiment 1—only one computer node in the plural computer nodes is configured with a shared memory:

Taking FIG. 8 as an example, the NIC A of computer node A is configured with a shared memory A, while the NIC B of computer node B is not configured with a shared memory.

Embodiment 1.1—a read/write operation is issued by the application process of the computer node configured with a shared memory:

Taking FIG. 8 as an example, assume that the application process A of computer node A issues a data read/write request.

Embodiment 1.1.1—the issued read/write request is a write data request:

Taking FIG. 8 as an example, assume that the application process A issues a write data request requesting to write a piece of data in the program buffer A into the shared memory. For the application, it is transparent whether data is stored using the shared memory.

First, the application process A provides the address of the program buffer A of the distributed application, the data to be written into the shared memory being stored in the program buffer A. The application process A invokes the language runtime A using a dedicated interface of the language runtime A to perform data writing. The language runtime A invokes the device driver module A to encapsulate the data into a physical layer data frame, including encapsulating the packet header and packet tail of the physical layer data frame. Therein the device driver module A is a device driver module dedicated to performing shared memory operations. Besides, the computer node A further includes a device driver module (not shown) corresponding to the control logic in the NIC A, i.e., a device driver module used in a traditional NIC. Next, the device driver module A will invoke the traditional device driver module, so as to copy the data from the program buffer A to the NIC memory (not shown) of the NIC A. As a variation of the above embodiment, the physical layer encapsulation can also be performed by a traditional device driver module.

Next, the microcontroller A determines whether the local NIC is configured with a shared memory according to the state switch on the NIC A. In this embodiment, the microcontroller A determines that the local NIC has the shared memory. Since only one computer node in the plural computer nodes is configured with the shared memory, the object at which the above data write request is targeted is exactly the shared memory on the local NIC.

Next, the microcontroller A copies the data packet from the NIC memory on the NIC A into the shared memory A. The above step has various implementations, one of which is to remove the packet header and packet tail of the data packet, and copy the effective data part therein into the shared memory A. In another implementation, the entire data packet is copied into the shared memory A.
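The two copy variants described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the header and tail lengths (HEADER_LEN, TAIL_LEN) and the function name copy_to_shared_memory are hypothetical and are not specified in this description.

```python
HEADER_LEN = 14  # assumed packet header length in bytes (hypothetical)
TAIL_LEN = 4     # assumed packet tail length in bytes (hypothetical)


def copy_to_shared_memory(frame: bytes, shared_memory: bytearray,
                          strip: bool = True) -> int:
    """Append a frame, or only its effective data part, to shared_memory.

    Returns the number of bytes copied.
    """
    if strip:
        if len(frame) < HEADER_LEN + TAIL_LEN:
            raise ValueError("frame is shorter than header plus tail")
        # Variant 1: remove the packet header and tail, keep the payload.
        data = frame[HEADER_LEN:len(frame) - TAIL_LEN]
    else:
        # Variant 2: copy the entire data packet unchanged.
        data = frame
    shared_memory.extend(data)
    return len(data)
```

Copying only the effective data saves space in the shared memory, while copying the whole packet avoids parsing work on the microcontroller; which trade-off is preferable depends on the implementation.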

To support data sharing, the storage capacity of the shared memory is usually very large, and much larger than the storage capacity of the NIC memory. In this case, if a write operation of a large amount of data is performed, it may be impossible to copy all the data to be written into the NIC memory at once. Therefore, a large bulk of data needs to be partitioned, so that the partitioned data can be written into the shared memory A piece by piece from the program buffer A through the NIC memory. All data read from the program buffer A are finally written into the shared memory A through buffering of the NIC memory.
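The piece-by-piece transfer through the smaller NIC memory can be illustrated with the following sketch. It assumes, for illustration only, that the NIC memory acts as a fixed-size staging area; the function name write_in_pieces and its parameters are hypothetical.

```python
def write_in_pieces(program_buffer: bytes, nic_memory_size: int,
                    shared_memory: bytearray) -> int:
    """Copy program_buffer into shared_memory through a staging area that
    holds at most nic_memory_size bytes. Returns the number of pieces."""
    if nic_memory_size <= 0:
        raise ValueError("nic_memory_size must be positive")
    pieces = 0
    for offset in range(0, len(program_buffer), nic_memory_size):
        # Stage one piece in the (simulated) NIC memory.
        nic_memory = program_buffer[offset:offset + nic_memory_size]
        # The microcontroller then moves the staged piece into shared memory.
        shared_memory.extend(nic_memory)
        pieces += 1
    return pieces
```

For example, a 10-byte buffer written through a 4-byte staging area is transferred in three pieces of 4, 4 and 2 bytes.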

As a variation of the present embodiment, the device driver module A can copy data from the program buffer A into the shared memory A directly by writing to the command port of the microcontroller A. For example, the KEY field in the command port can be used to describe the address of the data to be written in the program buffer A, so that the microcontroller can control copying the data at that address from the program buffer A to the shared memory A. The above manner can realize a direct data exchange between the program buffer and the shared memory, but can also bring additional control overheads.

Embodiment 1.1.2—the issued read/write request is a read data request:

Taking FIG. 8 as an example, assume that the application process A issues a read request, requesting to read data from the shared memory into the program buffer A.

First, the application process A provides the address of the program buffer A for receiving the data. The application process A invokes the language runtime A using a dedicated interface of the language runtime A. The language runtime A invokes the device driver module A to encapsulate the address of the program buffer A into a simple physical layer data frame. The device driver module A invokes a traditional device driver module (not shown) corresponding to the NIC memory in the NIC A, so as to copy the address of the program buffer A to the NIC memory (not shown) of the NIC A.

Next, the microcontroller A determines whether the local NIC is configured with a shared memory according to the state switch on the NIC A. In the present embodiment, the microcontroller A determines that the local NIC has the shared memory. Since only one computer node in the plural computer nodes is configured with the shared memory, the object upon which the above data read request is executed is exactly the shared memory on the local NIC.

Next, the microcontroller A copies the data from the shared memory A to the NIC memory of the NIC A, and then the controller of the NIC A copies the data from the NIC memory to the program buffer A according to the address of the program buffer A stored in the NIC memory.

As a variation of the present embodiment, the device driver module A can also directly copy the data from the shared memory A to the program buffer A by writing to the command port of the microcontroller A. In this case, a program buffer address field should be added to the structure of the command port to indicate the address into which the data is to be written.
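The direct-copy variation through the command port can be sketched as below. This is an illustrative model only: the Command structure, its field names, and the use of dictionaries to stand in for host memory and the shared memory are assumptions; only the existence of TID and KEY fields and of an added program buffer address field comes from the description. Allocation and release commands are not modeled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Tuple


class Op(Enum):
    READ = "read"
    WRITE = "write"


@dataclass
class Command:
    op: Op
    tid: int             # task identifier
    key: int             # key of the data within the task
    buffer_address: int  # program buffer address (the added field)


def execute(command: Command,
            program_memory: Dict[int, bytes],
            shared_memory: Dict[Tuple[int, int], bytes]) -> None:
    """Simulate the microcontroller acting on one command port write."""
    slot = (command.tid, command.key)
    if command.op is Op.WRITE:
        # Program buffer -> shared memory, bypassing the NIC memory.
        shared_memory[slot] = program_memory[command.buffer_address]
    elif command.op is Op.READ:
        # Shared memory -> program buffer at the address given in the command.
        program_memory[command.buffer_address] = shared_memory[slot]
    else:
        raise NotImplementedError(command.op)
```

A write followed by a read of the same (TID, KEY) pair into a different buffer address returns the bytes that were written.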

Embodiment 1.2—a read/write request is issued by the application process of a computer node not configured with a shared memory:

Taking FIG. 8 as an example, assume that the application process B of the computer node B issues a data read/write request.

Embodiment 1.2.1—the issued read/write request is a write data request:

Taking FIG. 8 as an example, assume that the application process B issues a write data request, requesting to write a piece of data in the program buffer B into the shared memory A.

First, the application process B provides the address of the program buffer B of the distributed application, in which the data to be written into the shared memory is stored. The application process B invokes the language runtime B through a dedicated interface of the language runtime B to perform data partition. The language runtime B invokes the device driver module B to encapsulate the data into a physical layer data frame, where the device driver module B is a device driver module dedicated to performing shared memory operations. In addition, the computer node B further includes a device driver module (not shown) corresponding to the controller in the NIC B, i.e., a device driver module used in a traditional NIC. Next, the device driver module B invokes the traditional device driver module, so as to copy the data from the program buffer B to the NIC memory (not shown) of the NIC B. As a variation of the above embodiment, the physical layer encapsulation of the data can also be performed by the traditional device driver module.

Next, the microcontroller B determines whether the local NIC is configured with a shared memory according to the state switch on the NIC B. In this embodiment, the microcontroller B determines that the local NIC does not have a shared memory thereon.

Next, the microcontroller B invokes the controller on the NIC B, so as to transmit the data in the NIC memory of the NIC B to other computer nodes through the medium access unit (not shown), the other computer nodes being computer node A in the present embodiment.

Then, the NIC A of computer node A receives the data and copies it to the shared memory A. Specifically, after the NIC A receives the data, the controller of the NIC A determines whether the data is to be written into the shared memory by querying a special identifier in the data frame, e.g., a type field. If the data is to be written into the shared memory, then the microcontroller A further determines whether there is a locally configured shared memory according to the state switch on its own NIC. In the present embodiment, the microcontroller A determines that a shared memory A is configured locally. Since only one computer node in the plural computer nodes is configured with a shared memory, the object upon which the above data write request is executed is exactly the shared memory on the NIC A. Next, the microcontroller A writes the data into the local shared memory A.

It should be pointed out that a data frame transmitted in a network can include a task identifier field TID and a key field KEY. The TID and KEY can be recorded in the data fields shown in FIGS. 7A and 7B, and the values of TID and KEY will be recorded in the allocation table of the shared memory. As for the structure of the allocation table, refer to the above description with respect to FIG. 6.
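The receive-side handling of a write frame carrying TID and KEY can be sketched as follows. The byte layout (a 2-byte type field followed by 4-byte TID and KEY fields), the type value 0x88B5, and the modeling of the allocation table as a dictionary keyed by (TID, KEY) are all assumptions made for this illustration; the actual frame structures are those of FIGS. 7A and 7B and the actual allocation table is that of FIG. 6.

```python
import struct
from typing import Dict, Tuple

SHARED_MEMORY_TYPE = 0x88B5       # placeholder type value (assumption)
HEADER = struct.Struct("!HII")    # type field, TID, KEY (assumed layout)


def build_frame(tid: int, key: int, payload: bytes) -> bytes:
    """Sender side: prepend the type, TID and KEY fields to the payload."""
    return HEADER.pack(SHARED_MEMORY_TYPE, tid, key) + payload


def receive_frame(frame: bytes, has_shared_memory: bool,
                  allocation_table: Dict[Tuple[int, int], bytes]) -> str:
    """Receiver side: check the type field, then the state switch,
    then store the payload under its (TID, KEY) pair."""
    frame_type, tid, key = HEADER.unpack_from(frame)
    if frame_type != SHARED_MEMORY_TYPE:
        return "ordinary"   # not a shared memory frame
    if not has_shared_memory:
        return "ignored"    # state switch says no local shared memory
    allocation_table[(tid, key)] = frame[HEADER.size:]
    return "written"
```

The check order mirrors the text: the controller inspects the type field first, and only then does the microcontroller consult the state switch.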

Embodiment 1.2.2—the issued read/write request is a read data request:

Taking FIG. 8 as an example, assume the application process B issues a read data request, requesting to read data from the shared memory into the program buffer B.

First, the application process B provides the address of the program buffer B for receiving data. The application process B invokes the language runtime B using a dedicated interface of the language runtime B. Then, the language runtime B invokes the device driver module B to encapsulate the address of the program buffer B into a physical layer data frame. Moreover, the device driver module B invokes a traditional device driver module (not shown) corresponding to the NIC memory in NIC B, so as to copy the address of the program buffer B into the NIC memory (not shown) of the NIC B.

Next, the microcontroller B determines whether a shared memory is configured locally according to the state switch on the NIC B. In the present embodiment, the microcontroller B determines that there is no shared memory on the local NIC, and then the microcontroller B either forwards the data read request to other computer nodes or simply neglects this data read request, the other computer node being computer node A in the present embodiment.

Next, the controller in the NIC A can determine whether the requested data is data stored in the shared memory by checking the type field in the data frame. If the conclusion is yes, the microcontroller A further determines whether there is a shared memory configured locally. If the further conclusion is yes, the microcontroller A parses the data read request and constructs the data in the shared memory A into a physical layer data frame. Then, the NIC A transmits the data to the computer node B. After receiving the data, the computer node B copies the data to the program buffer B under the control of the controller of NIC B, thus accomplishing the data read operation.
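The remote read round trip described above can be illustrated with a small simulation. The dictionary-based request and response messages, the type strings, and both function names are hypothetical; the sketch also assumes that the requesting node keeps the program buffer address in its own NIC memory, as in the local read case.

```python
from typing import Dict, Optional, Tuple


def serve_read_request(request: dict, has_shared_memory: bool,
                       shared_memory: Dict[Tuple[int, int], bytes]
                       ) -> Optional[dict]:
    """Runs on the node that owns the shared memory (node A in the text)."""
    if request.get("type") != "shared-memory-read":
        return None   # type field check by the controller
    if not has_shared_memory:
        return None   # state switch check by the microcontroller
    data = shared_memory[(request["tid"], request["key"])]
    return {"type": "shared-memory-data", "data": data}


def complete_read(response: dict, nic_memory: dict,
                  program_memory: Dict[int, bytes]) -> None:
    """Runs on the requesting node (node B): copy the received data to the
    program buffer whose address was stored in the NIC memory."""
    program_memory[nic_memory["buffer_address"]] = response["data"]
```

In this model a request for a frame type other than the shared memory read type, or a request arriving at a NIC without shared memory, yields no response.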

Embodiment 2—the plural computer nodes are all configured with shared memories to support different tasks:

Taking FIG. 9 as an example, the NIC A of the computer node A is configured with a shared memory A, and the NIC B of the computer node B is configured with a shared memory B. The shared memory A and the shared memory B can be used to support different tasks. Hereinafter, only the parts of embodiment 2 that differ from embodiment 1 are described in detail, while the parts identical to embodiment 1 are described only briefly.

Embodiment 2.1—a read/write request is issued to the local shared memory:

Taking FIG. 9 as an example, assume that the application process A of the computer node A issues a data read/write request, requesting to perform a read/write operation on the shared memory.

Embodiment 2.1.1—the issued read/write request is a write data request:

Taking FIG. 9 as an example, assume that the application process A issues a write data request, requesting to write a piece of data in the program buffer A into the shared memory A.

First, the application process A provides the address of the program buffer A of the distributed application. The application process A invokes the language runtime A, which in turn invokes the device driver module A to encapsulate the data into a physical layer data frame. Next, the device driver module A invokes the traditional device driver module (not shown) of the NIC A, so as to copy the data from the program buffer A into the NIC memory (not shown) of the NIC A.

Next, the microcontroller A determines whether the local NIC is configured with a shared memory according to the state switch on the NIC A. In the present embodiment, the microcontroller A determines that the local NIC is configured with a shared memory. Next, the microcontroller A further determines whether the shared memory A configured on the local NIC supports the write operation, i.e., whether data is to be written into this instance of shared memory A instead of the shared memories on other computer nodes, according to the task identification code TID stored in the allocation table of the shared memory A. Specifically, the microcontroller A can determine whether the write operation is to be performed on the local shared memory by comparing the task identification code TID in the allocation table with the TID field of the write command obtained by the command port of the NIC A. In the present embodiment, the microcontroller A determines that the write operation is to be performed on the local shared memory A.

Next, the microcontroller A copies the data from the NIC memory on the NIC A into the shared memory A.

Embodiment 2.1.2—the issued read/write request is a read data request:

Taking FIG. 9 as an example, assume the application process A issues a read data request, requesting to read the data from the shared memory into the program buffer A.

First, the application process A provides the address of the program buffer A for receiving data. The application process A invokes the language runtime A. Next, the language runtime A invokes the device driver module A to encapsulate the address of the program buffer A into a simple physical layer data frame. The device driver module A invokes a traditional device driver module (not shown) corresponding to the NIC memory in the NIC A so as to copy the address of the program buffer A into the NIC memory (not shown) of the NIC A.

Next, the microcontroller A determines whether the local NIC is configured with a shared memory according to the state switch on the NIC A. In the present embodiment, the microcontroller A determines that there is a shared memory on the local NIC. Next, the microcontroller A further determines whether the shared memory A configured on the local NIC supports the read operation, i.e., whether the data is to be read from the shared memory A instead of the shared memories on other computer nodes, according to the task identification code TID stored in the allocation table of the shared memory A.

Then, the microcontroller A copies the data from the shared memory A into the NIC memory of the NIC A, and the controller of the NIC A copies the data from the NIC memory to the program buffer A according to the address of the program buffer A stored in the NIC memory, thus accomplishing the data read operation.

Embodiment 2.2—a read/write request is issued to a remote shared memory:

Taking FIG. 9 as an example, assume the application process A of the computer node A issues a data read/write request, requesting to perform a read/write operation on a shared memory located at a remote node B.

Embodiment 2.2.1—the issued read/write request is a write data request:

Taking FIG. 9 as an example, assume that the application process A issues a write data request, requesting to write a piece of data in the program buffer A into the shared memory B.

First, the application process A provides the address of the program buffer A of the distributed application, in which the data to be written into the shared memory is stored, and the application process A invokes the language runtime A. Then, the language runtime A in turn invokes the device driver module A to encapsulate the data into a physical layer data frame. Next, the device driver module A invokes a traditional device driver module (not shown) corresponding to the controller in the NIC A, so as to copy the data from the program buffer A into the NIC memory (not shown) on the NIC A.

Next, the microcontroller A determines whether the local NIC is configured with a shared memory according to the state switch on the NIC A. In the present embodiment, the microcontroller A determines that there is a shared memory on the local NIC. Next, the microcontroller A further determines whether the shared memory A configured on the local NIC supports the write operation, i.e., whether the data is to be written into the shared memory A instead of the shared memories on other computer nodes, according to the task identification code TID stored in the allocation table of the shared memory A. In the present embodiment, the application process A is to write the data into the shared memory B, so the write operation is not performed on the local shared memory A.

Then, the microcontroller A invokes the controller on the NIC A, so as to transmit the data in the NIC memory of the NIC A to the computer node B through a medium access unit (not shown).

Then, the NIC B of the computer node B receives the data and copies the data to the shared memory B. Specifically, after the NIC B has received the data, the controller of the NIC B determines whether the data is to be written into the shared memory by querying a special identifier in the data frame, e.g., a type field. If the data is to be written into the shared memory, then the microcontroller B further determines whether a shared memory is provided locally according to the state switch on its own NIC. In the present embodiment, the microcontroller B determines that there is a shared memory B configured locally. Next, the microcontroller B further determines whether the shared memory B configured on the local NIC supports the write operation, i.e., whether the data is to be written into the shared memory B instead of the shared memories on other computer nodes, according to the task identification code TID stored in the allocation table of the shared memory B. If the conclusion is yes, the microcontroller B writes the data into the local shared memory.

Embodiment 2.2.2—the issued read/write request is a read data request:

Taking FIG. 9 as an example, assume the application process A issues a data read request, requesting to read data from the shared memory B into the program buffer A.

First, the application process A provides the address of the program buffer A for receiving data. The application process A invokes the language runtime A. Then, the language runtime A invokes the device driver module A to encapsulate the address of the program buffer A into a simple physical layer data frame. The device driver module A invokes a traditional device driver module (not shown) corresponding to the NIC memory in the NIC A, so as to copy the address of the program buffer A into the NIC memory (not shown) of the NIC A.

Next, the microcontroller A determines whether a shared memory is configured locally according to the state switch on the NIC A. In the present embodiment, the microcontroller A determines that there is a shared memory on the local NIC. Next, the microcontroller A further determines whether the shared memory A configured on the local NIC supports the read operation, i.e., whether the data is to be read from the shared memory A instead of the shared memories on the other computer nodes, according to the task identification code TID stored in the allocation table of the shared memory A. In the present embodiment, the application process A is to read the data from the shared memory B.

Then, the NIC A transmits the data read request to the NIC B. Next, the controller in the NIC B can determine whether it is a read/write request to the shared memory, i.e., whether the requested data is data stored in the shared memory, according to the type field in the data frame. If the conclusion is yes, the microcontroller B further determines whether there is a shared memory configured locally. If the further conclusion is yes, the microcontroller B parses the data read request and constructs the data in the shared memory into a physical layer network frame. Thereafter, the NIC B transmits the data in the physical layer network frame to the computer node A. After receiving the data, the computer node A copies the data to the program buffer A under the control of the controller of the NIC A, thus accomplishing the data read operation.

The various embodiments described above in conjunction with FIGS. 8 and 9 only schematically describe some steps of the related read/write operations in the present invention. More detailed steps for memory allocation and release have been described generally above and will not be repeated here.

FIG. 10 illustrates a flowchart of a method for controlling a read/write operation on a shared memory of an NIC. In step 1001, it is determined whether a shared memory supporting the read/write operation is configured locally. In step 1003, if the shared memory supporting the read/write operation is configured locally, the read/write operation is performed on the local shared memory. The detailed process of performing a read/write operation has been described above, and will not be repeated here.

Furthermore, when there is no shared memory supporting the read/write operation configured locally, a remote shared memory is requested to perform the read/write operation. If the read/write operation is a write operation, then a physical layer encapsulation is further performed on the data to be written, and the step of requesting a remote shared memory to perform the read/write operation further includes transmitting the encapsulated data to be written to the remote shared memory. If the read/write operation is a read operation, then a physical layer encapsulation is performed on the address of the program buffer into which the data to be read is to be written, so that the address becomes part of the data read request transmitted to the remote node, and the step of requesting the remote shared memory to perform the read/write operation further includes transmitting the encapsulated data read request to the remote shared memory. The detailed process of the above operations has been described above and will be omitted here.
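The control flow of FIG. 10, including the fallback to a remote shared memory, can be modeled with the following sketch. The Nic class, its fields, and the representation of the supported task identifiers as a set are assumptions made for illustration; network transmission and physical layer encapsulation are replaced by a direct method call on the remote object.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

Slot = Tuple[int, int]  # (TID, KEY)


@dataclass
class Nic:
    name: str
    state_switch: bool = False                       # shared memory present?
    task_ids: Set[int] = field(default_factory=set)  # TIDs served locally
    shared_memory: Dict[Slot, bytes] = field(default_factory=dict)
    remote: Optional["Nic"] = None                   # stand-in for the network

    def supports(self, tid: int) -> bool:
        """Is a shared memory supporting this task configured locally?"""
        return self.state_switch and tid in self.task_ids

    def _select(self, tid: int) -> "Nic":
        if self.supports(tid):   # step 1001 answered yes
            return self          # step 1003: operate on local shared memory
        if self.remote is not None and self.remote.supports(tid):
            return self.remote   # request the remote shared memory instead
        raise LookupError(f"no shared memory supports task {tid}")

    def write(self, tid: int, key: int, data: bytes) -> str:
        target = self._select(tid)
        target.shared_memory[(tid, key)] = data
        return target.name       # name of the NIC that stored the data

    def read(self, tid: int, key: int) -> bytes:
        return self._select(tid).shared_memory[(tid, key)]
```

With NIC A serving task 1 and NIC B serving task 2, a write for task 2 issued on NIC A ends up in the shared memory of NIC B, which corresponds to Embodiment 2.2.1.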

FIG. 11 illustrates a flowchart of a method for determining whether a shared memory supporting the read/write operation is configured locally according to an embodiment of the present invention. First, in step 1101, it is determined whether the local NIC is configured with a shared memory according to the state switch on the NIC. Then in step 1103, if the local NIC is configured with a shared memory, it is further determined whether the shared memory configured on the local NIC supports the read/write operation according to the task identification code stored in the shared memory. The above two determination steps have been described in more detail above, and the description will be omitted here.

FIG. 12 illustrates a flowchart of a method for invoking an NIC according to an embodiment of the present invention. In step 1201, a program buffer of the distributed application is provided; in step 1203, a language runtime is invoked through a dedicated interface of the language runtime; in step 1205, the device driver module is invoked to perform physical layer encapsulation; in step 1207, the read/write operation on the shared memory of the NIC is controlled using the above method. The above steps have been described in more detail above, and the description will not be repeated here.
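The layering of FIG. 12 can be illustrated as a chain of calls. All function names and the placeholder header and tail byte strings are hypothetical; nic_write stands in for whatever mechanism hands the frame to the NIC for step 1207.

```python
from typing import Callable


def device_driver_encapsulate(payload: bytes) -> bytes:
    """Step 1205: physical layer encapsulation (placeholder header/tail)."""
    return b"<HDR>" + payload + b"<TAIL>"


def language_runtime_write(program_buffer: bytearray,
                           nic_write: Callable[[bytes], object]) -> None:
    """Step 1203: the runtime's dedicated interface invokes the driver."""
    frame = device_driver_encapsulate(bytes(program_buffer))
    nic_write(frame)  # step 1207: hand the frame to the NIC


def application_write(data: bytes,
                      nic_write: Callable[[bytes], object]) -> None:
    """Step 1201: the application provides its program buffer."""
    program_buffer = bytearray(data)
    language_runtime_write(program_buffer, nic_write)
```

In a quick experiment, passing the append method of a list as nic_write captures the encapsulated frame that would reach the NIC.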

It will be appreciated by those skilled in the art that, unless explicitly stated, the present invention can be implemented as a system, method or computer program product. Therefore, unless explicitly stated, the present invention can be implemented in the following forms, i.e., complete hardware, complete software (including firmware, resident software, microcode, etc.), or a combination of software part and hardware part which is generally called “circuit”, “module” or “system” herein. Furthermore, the present invention can be implemented in the form of a computer program product embodied in any tangible medium of expression, which medium includes computer usable program code.

Any combination of one or more computer usable or computer readable mediums can be used. A computer usable or computer readable medium can be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus, device or transmission medium. More specific examples of a computer readable medium (a non-exhaustive list) include the following: electric connection with one or more wires, portable computer disk, hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, transmission medium supporting, for example, the Internet or an internal network, or magnetic storage device. It should be noted that a computer usable or computer readable medium can even be paper or another medium on which programs are printed, because, by electrically scanning the paper or other medium, for example, the program can be obtained in an electrical manner, and can be compiled, interpreted or processed in a proper way, and stored in a computer memory if necessary. In the context of this specification, a computer usable or computer readable medium can be any medium that contains, stores, conveys, propagates or transmits programs to be used by or associated with an instruction execution system, device or apparatus. The computer usable medium can include data signals transmitted in a baseband or as part of a carrier, and embodying computer usable program code. The computer usable program code can be transmitted through any appropriate medium, including but not limited to wireless, cable, optical fiber, and RF.

The computer program code for performing the operations of the present invention can be written in any combination of one or more programming languages, which include object-oriented programming languages such as Java, Smalltalk, C++, as well as conventional procedural programming languages such as the “C” programming language or similar programming languages. The program code can be executed entirely on a user's computer, or executed partially on a user's computer, or executed as an independent software package, or executed partially on a user's computer and partially on a remote computer, or executed entirely on a remote computer or server. In the latter case, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or wide area network (WAN), or can be connected to external computers (e.g., through the Internet using an internet service provider).

The present invention is described above by referring to the flowcharts and/or block diagrams of the method, apparatus (system) and computer program product according to embodiments of the present invention. It should be appreciated that each block of the flowcharts and/or block diagrams and the combination of the blocks in the flowcharts and/or block diagrams can be implemented by computer program instructions, which can be provided to a general-purpose computer, a dedicated computer or processors of other programmable data processing devices, so as to produce a machine such that the instructions, when executed by the computer or other programmable data processing device, produce an apparatus for implementing the functions/operations specified in the blocks of the flowcharts and/or block diagrams.

The computer program instructions can also be stored in a computer readable medium that is capable of instructing a computer or other programmable data processing devices to operate in a specific way, by which the instructions stored in the computer readable medium produce a manufactured product including instruction means for implementing the functions/operations specified in the blocks of the flowcharts and/or block diagrams.

The computer program instructions can also be loaded in a computer or other programmable data processing devices to enable the computer or other programmable data processing devices to perform a series of operation steps, to produce computer-implemented processes, so that the instructions executed on the computer or other programmable data processing devices provide a process of implementing the functions/operations specified in the blocks of the flowcharts and/or block diagrams.

The flowcharts and block diagrams in the accompanying drawings illustrate the architectures, functions or operations that can be implemented according to the system, method or computer program products of the various embodiments of the present invention. In this regard, each block in the flowcharts or the block diagrams represents a module, a program segment or part of the code, where said module, program segment or part of the code includes one or more executable instructions for implementing the specified logic functions. It should also be noted that, in some alternative implementations, the functions indicated in the blocks can occur in a different order from that indicated in the blocks. For example, two blocks illustrated consecutively can actually be performed substantially in parallel, and sometimes can also be performed in a reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or the flowcharts and the combination of blocks in the block diagrams and/or flowcharts can be implemented by a dedicated hardware-based system that performs specified functions or operations, or can be implemented by a combination of dedicated hardware and computer instructions.

The terminology used herein is only for describing specific embodiments, and not intended to limit the present invention. The singular forms of “one” and “the” used herein are intended to include plural forms, unless explicitly stated otherwise in the context. It should also be appreciated that, when the word “include” is used herein, it means the existence of the indicated features, entities, steps, operations, units and/or components, but does not exclude the existence or addition of one or more other features, entities, steps, operations, units and/or components, and/or the combination thereof.

Equivalent alternatives of the corresponding structures, materials, operations and all the functionally defined means or steps in the claims are intended to include any structures, materials or operations for executing the functions in combination with other units specifically stated in the claims. The given description of the present invention is intended to illustrate and describe; it is not exhaustive, nor does it limit the present invention to the described forms. For those of ordinary skill in the art, it is obvious that modifications and variations can be made without departing from the scope and spirit of the present invention. The selection and description of the embodiments are for the purpose of best explaining the principles and actual application of the present invention, so that those of ordinary skill in the art can understand that the present invention can have various implementations with all kinds of variations suitable for the desired specific purposes.

1. A network interface card, comprising: a shared memory configured toprovide shared storage for tasks of distributed applications, whereinsaid shared memory can be accessed by a plurality of computing nodesexecuting a same task; and a microcontroller configured to controlread/write operations on said shared memory.
 2. The network interfacecard of claim 1, wherein said microcontroller is further configured to:determine whether said shared memory supports a received read/writerequest; and perform the read/write operation to said shared memory whensaid shared memory supports the received read/write request.
 3. Thenetwork interface card of claim 1, further comprising: a command port,wherein (i) said command port is connected with said microcontroller,and (ii) said command port is configured to transmit a control commandto the microcontroller.
 4. The network interface card of claim 3,wherein said control command includes fields for controlling said sharedmemory to perform one of the following operations: read operation, writeoperation, allocation operation and release operation.
 5. The networkinterface card of claim 1, further comprising: a state switch; and saidmicrocontroller being further configured to determine whether a sharedmemory is configured on said network interface card according to saidstate switch.
 6. The network interface card of claim 5, wherein a taskidentification code is stored in said shared memory, and wherein if ashared memory is configured on said network interface card, then saidmicrocontroller is further configured to determine whether said sharedmemory supports the received read/write request according to said taskidentification code.
 7. The network interface card of claim 1, whereinthe frame structure of data stored in said shared memory includes anidentifier to indicate that said data is targeted at said shared memory.8. The network interface card of claim 7, wherein said identifier isrecorded in a type field of a physical layer header of said data.
 9. Thenetwork interface card of claim 1, further comprising: a networkinterface card memory configured to buffer the data read/written by saidshared memory.
 10. The network interface card of claim 1, wherein said shared memory is further configured to directly perform the read/write operation with a program buffer outside said network interface card.
 11. A computer device, comprising the network interface card of claim 1.
 12. A method for controlling a read/write operation on a shared memory of a network interface card, wherein said shared memory is configured to provide shared storage for tasks of a distributed application, and said shared memory is accessed by a plurality of computing nodes executing a same task, said method comprising: determining whether a local network interface card is configured with a shared memory supporting said read/write operation; and performing the read/write operation to the shared memory on the local network interface card when the local network interface card is configured with the shared memory supporting said read/write operation.
 13. The method of claim 12, wherein the step of determining whether the local network interface card is configured with the shared memory supporting said read/write operation comprises: determining whether the local network interface card is configured with the shared memory according to a state switch on the network interface card.
 14. The method of claim 13, wherein the step of determining whether the local network interface card is configured with the shared memory supporting said read/write operation further comprises: determining whether the shared memory configured on the local network interface card supports the read/write operation according to a task identification code stored on the shared memory on the local network interface card if the local network interface card is configured with the shared memory.
 15. The method of claim 12, further comprising: requesting a shared memory on a remote network interface card to perform the read/write operation when there is no shared memory supporting said read/write operation configured on the local network interface card.
 16. The method of claim 15, wherein if said read/write operation is a write operation, then said step of requesting a shared memory on a remote network interface card to perform the read/write operation further comprises: transmitting data to be written to the shared memory on the remote network interface card.
 17. The method of claim 15, wherein if the read/write operation is a read operation, then said step of requesting a shared memory on a remote network interface card to perform the read/write operation further comprises: transmitting a data read request to the shared memory on the remote network interface card.
 18. The method of claim 12, wherein said network interface card further comprises a network interface card memory, and wherein the step of performing the read/write operation on the shared memory on the local network interface card further comprises: performing the read/write operation on the shared memory on the local network interface card through a buffer of the memory of the local network interface card.
 19. The method of claim 12, wherein the step of performing the read/write operation on the shared memory on the local network interface card further comprises: directly performing said read/write operation between the shared memory on the local network interface card and a program buffer outside the local network interface card.
 20. The method of claim 12, further comprising: performing physical layer encapsulation on the data to be written if said read/write operation is a write operation.
 21. The method of claim 12, further comprising: performing physical layer encapsulation on a program buffer address of the program buffer into which the data to be read out is to be written if said read/write operation is a read operation.
 22. A method for invoking a network interface card, the method comprising: providing a program buffer of a distributed application; invoking a language runtime through a dedicated interface on the language runtime; invoking a device driver module to perform physical layer encapsulation; and controlling a read/write operation on a shared memory of the network interface card by a method comprising: determining whether a local network interface card is configured with a shared memory supporting said read/write operation; and performing the read/write operation to the shared memory on the local network interface card when the local network interface card is configured with the shared memory supporting said read/write operation; wherein said shared memory is configured to provide shared storage for tasks of a distributed application; and wherein said shared memory is accessed by a plurality of computing nodes executing a same task.
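
For illustration only, the decision flow recited in claims 12 through 17 can be sketched in Python. This is a minimal model under stated assumptions, not the claimed implementation: the names Nic, Op, supports, perform and read_write are hypothetical, the shared memory is modeled as a dictionary keyed by address, and the network transfer to a remote network interface card is replaced by a direct method call. Raising an error when neither card supports the task is also an assumption; the claims do not address that case.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional, Tuple


class Op(Enum):
    """The two operations covered by the method of claim 12."""
    READ = "read"
    WRITE = "write"


@dataclass
class Nic:
    """Hypothetical model of a NIC that may carry a shared memory."""
    state_switch: bool = False        # True when a shared memory is configured
    task_id: Optional[int] = None     # task identification code kept in the shared memory
    shared_memory: Dict[int, bytes] = field(default_factory=dict)

    def supports(self, task_id: int) -> bool:
        # Claim 13: consult the state switch first.
        # Claim 14: only if a shared memory is configured, compare task codes.
        return self.state_switch and self.task_id == task_id

    def perform(self, op: Op, address: int,
                data: Optional[bytes] = None) -> Optional[bytes]:
        if op is Op.WRITE:
            if data is None:
                raise ValueError("a write operation needs data")
            self.shared_memory[address] = data
            return None
        # Read: return the stored bytes, or None if nothing was written there.
        return self.shared_memory.get(address)


def read_write(local: Nic, remote: Nic, task_id: int, op: Op, address: int,
               data: Optional[bytes] = None) -> Tuple[str, Optional[bytes]]:
    """Use the local shared memory when it supports the task, else the remote one."""
    if local.supports(task_id):
        return "local", local.perform(op, address, data)
    # Claim 15: fall back to a shared memory on a remote NIC.
    # Assumption: fail loudly if the remote card does not support the task either.
    if not remote.supports(task_id):
        raise LookupError("no shared memory supports this task")
    # Claims 16 and 17: a write sends the data, a read sends a read request.
    return "remote", remote.perform(op, address, data)


if __name__ == "__main__":
    node_a = Nic(state_switch=True, task_id=7)   # carries the shared memory
    node_b = Nic()                               # no shared memory configured
    print(read_write(node_a, node_b, 7, Op.WRITE, 0, b"hello"))
    print(read_write(node_b, node_a, 7, Op.READ, 0))
```

In this sketch, the node holding the shared memory writes locally, while the second node, whose state switch is off, is served by the first node's shared memory. The buffering of claims 18 and 19 and the physical layer encapsulation of claims 20 and 21 are deliberately left out.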